Computer Engineering

A Real-Time Semantic Segmentation Algorithm Based on Multi-Scale Feature Fusion and a Polarized Self-Attention Mechanism

  • Online: 2025-04-10 Published: 2025-04-10

Abstract: Real-time semantic segmentation is a core task in computer vision and plays a critical role in applications such as autonomous driving and traffic control. Existing real-time semantic segmentation algorithms based on the encoder-decoder structure usually achieve real-time performance at the cost of segmentation accuracy. In particular, to remain fast, such algorithms typically use a small receptive field, which leads to poor segmentation of large-scale objects in road scenes. To address this problem, this paper proposes a real-time semantic segmentation algorithm for road scenes based on the encoder-decoder structure. First, a multi-scale feature fusion mechanism is designed in the feature extraction stage to effectively fuse features from larger receptive fields and improve the segmentation of large-scale objects. Then, a polarized self-attention mechanism is incorporated at the end of the encoder to enhance local perception within the large receptive fields, further improving the segmentation of large-scale objects. The algorithm was tested on the Cityscapes and CamVid datasets. Experimental results show that, on a single NVIDIA RTX 3090, it achieves an MIoU of 80.6 at 43.5 FPS on Cityscapes and 81.1 at 91.2 FPS on CamVid, attaining higher segmentation accuracy.
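The MIoU (mean intersection over union) figures quoted in the abstract can be illustrated with a minimal sketch of how the metric is commonly computed. This is not the authors' evaluation code: the function name `mean_iou`, its parameters, and the `ignore_index` convention are assumptions chosen for illustration, and the function returns a fraction in [0, 1], whereas values such as 80.6 appear to be on a percentage scale.

```python
from typing import List, Optional, Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Mean IoU over flattened per-pixel class labels (hypothetical helper).

    Classes that appear in neither the prediction nor the ground truth
    are skipped, so they do not pull the average down.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Usage: six pixels, three classes.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(f"{score:.3f}")
```

Benchmark evaluations typically accumulate the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores; the sketch follows that convention if all pixels of the set are passed in together.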